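Remotely programming robots to execute tasks often relies on registering objects of interest in the robot's environment. These tasks frequently involve articulating objects, such as opening or closing a valve. However, existing human-in-the-loop methods for registering objects do not consider articulation and its effect on object geometry, which can cause the methods to fail. In this work, we present an approach in which the registration system attempts to automatically determine the object model, pose, and articulation for user-selected points using nonlinear fitting and the iterative closest point algorithm. When the fit is incorrect, the operator can iteratively intervene with corrections, after which the system refits the object. We present an implementation of our fitting procedure for one degree-of-freedom (DOF) objects with revolute joints and evaluate it through a user study, which shows that it can improve user performance, measured by time on task and task load, as well as ease of use and usefulness, compared to a manual registration approach. We also present an example that integrates our method into an end-to-end system for articulating a remote valve.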
translated by Google Translate
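The iterative closest point step mentioned in the abstract above can be illustrated with a minimal 2D sketch. This is not the authors' registration system: it handles only a rigid pose (no articulation, no nonlinear fitting), and the function names `best_rigid_2d`, `transform`, and `icp_2d` are hypothetical names chosen for this illustration.

```python
import math

def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    cxs = sum(p[0] for p in src) / n
    cys = sum(p[1] for p in src) / n
    cxd = sum(p[0] for p in dst) / n
    cyd = sum(p[1] for p in dst) / n
    s_cos = s_sin = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cxs, ys - cys          # centered source point
        bx, by = xd - cxd, yd - cyd          # centered target point
        s_cos += ax * bx + ay * by           # accumulated dot products
        s_sin += ax * by - ay * bx           # accumulated cross products
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, tx, ty

def transform(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(model, scene, iterations=20):
    """Estimate the rigid transform that aligns model points to scene points."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        moved = transform(model, theta, tx, ty)
        # Match every moved model point to its nearest scene point.
        matches = [
            min(scene, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in moved
        ]
        # Re-estimate the full transform from the original model to the matches.
        theta, tx, ty = best_rigid_2d(model, matches)
    return theta, tx, ty

if __name__ == "__main__":
    model = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0), (1.0, 3.0)]
    scene = transform(model, math.radians(10.0), 0.1, -0.05)
    print(icp_2d(model, scene))
```

Like any nearest-neighbor ICP, this sketch can converge to a wrong alignment when the initial misalignment is large, which is the kind of failure where operator correction followed by refitting would apply.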
To date, little attention has been given to multi-view 3D human mesh estimation, despite its real-life applicability (e.g., motion capture, sports analysis) and robustness to single-view ambiguities. Existing solutions typically generalize poorly to new settings, largely due to the limited diversity of image-mesh pairs in multi-view training data. To address this shortcoming, researchers have explored the use of synthetic images. But besides the usual impact of the visual gap between rendered and target data, synthetic-data-driven multi-view estimators also overfit to the camera viewpoint distribution sampled during training, which usually differs from real-world distributions. Tackling both challenges, we propose a novel simulation-based training pipeline for multi-view human mesh recovery, which (a) relies on intermediate 2D representations that are more robust to the synthetic-to-real domain gap; (b) leverages learnable calibration and triangulation to adapt to more diverse camera setups; and (c) progressively aggregates multi-view information in a canonical 3D space to remove ambiguities in 2D representations. Through extensive benchmarking, we demonstrate the superiority of the proposed solution, especially for unseen in-the-wild scenarios.
Multilevel Stein variational gradient descent is a method for particle-based variational inference that leverages hierarchies of approximations of target distributions with varying costs and fidelity to computationally speed up inference. This work provides a cost complexity analysis of multilevel Stein variational gradient descent that applies under milder conditions than previous results, especially in discrete-in-time regimes and beyond the limited settings where Stein variational gradient descent achieves exponentially fast convergence. The analysis shows that the convergence rate of Stein variational gradient descent enters only as a constant factor for the cost complexity of the multilevel version, which means that the costs of the multilevel version scale independently of the convergence rate of Stein variational gradient descent on a single level. Numerical experiments with Bayesian inverse problems of inferring discretized basal sliding coefficient fields of the Arolla glacier ice demonstrate that multilevel Stein variational gradient descent achieves orders of magnitude speedups compared to its single-level version.
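For readers unfamiliar with the single-level building block, the following is a minimal sketch of plain Stein variational gradient descent in one dimension with a fixed-bandwidth RBF kernel. It does not implement the multilevel scheme or the glacier inverse problem; the target (a unit-variance Gaussian with mean 2), the step size, the bandwidth, and the names `svgd_step` and `run_svgd` are assumptions made for illustration.

```python
import math

def svgd_step(particles, score, step=0.1, bandwidth=1.0):
    """One SVGD update for 1D particles; score(x) is d/dx log p(x) of the target."""
    n = len(particles)
    h2 = bandwidth ** 2
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xi - xj) ** 2) / (2.0 * h2))
            # First term pulls particles toward high target density;
            # second term (kernel gradient w.r.t. xj) pushes particles apart.
            phi += k * score(xj) + k * (xi - xj) / h2
        updated.append(xi + step * phi / n)
    return updated

def run_svgd(particles, score, steps=1000):
    """Apply svgd_step repeatedly and return the final particle positions."""
    for _ in range(steps):
        particles = svgd_step(particles, score)
    return particles

if __name__ == "__main__":
    # Hypothetical target: Gaussian with mean 2 and unit variance, so score(x) = 2 - x.
    start = [-5.0 + 0.01 * i for i in range(20)]
    final = run_svgd(start, lambda x: 2.0 - x)
    print(sum(final) / len(final), min(final), max(final))
```

A multilevel variant would, roughly, run such updates on cheaper approximations of the target first and reuse the resulting particles as the starting point on more expensive levels.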
Many visualization techniques have been created to help explain the behavior of convolutional neural networks (CNNs), but they largely consist of static diagrams that convey limited information. Interactive visualizations can provide richer insights and let users explore a model's behavior more easily; however, they are typically specific to a particular model and not easily reusable. We introduce Visual Feature Search, a novel interactive visualization that generalizes to any CNN and can easily be incorporated into a researcher's workflow. Our tool allows a user to highlight an image region and search a given dataset for the images with the most similar CNN features. It supports searching through large image datasets with an efficient cache-based search implementation. We demonstrate how our tool elucidates different aspects of model behavior by performing experiments on supervised, self-supervised, and human-edited CNNs. We also release a portable Python library and several IPython notebooks so that researchers can easily use our tool in their own experiments. Our code can be found at https://github.com/lookingglasslab/VisualFeatureSearch.
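The core search idea, ranking dataset images by how similar their features are to the features of a highlighted region, can be sketched as follows. This is not the Visual Feature Search library's API: the use of cosine similarity, the flat feature vectors, and the names `cosine_similarity` and `search_features` are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def search_features(query, dataset_features, top_k=3):
    """Return indices of the top_k dataset entries most similar to the query."""
    scored = [
        (cosine_similarity(query, feats), idx)
        for idx, feats in enumerate(dataset_features)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]

if __name__ == "__main__":
    # Toy stand-ins for cached CNN features of four dataset images.
    cached = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
    print(search_features([1.0, 0.0, 0.0], cached, top_k=2))
```

Computing the dataset features once and reusing them across queries is one plausible reading of a cache-based search; the actual tool may organize its cache differently.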
Classifying smartphone-captured chest X-ray (CXR) photos to detect pathologies is challenging because of the projective transformation caused by non-ideal camera positions. Recently, various rectification methods have been proposed for different photo rectification tasks, such as document photos and license plate photos. Unfortunately, we found that none of them is suitable for CXR photos, owing to their specific transformation type, image appearance, annotation type, etc. In this paper, we propose a deep learning-based Projective Transformation Rectification Network (PTRN) that automatically rectifies CXR photos by predicting the projective transformation matrix. To the best of our knowledge, it is the first work to use the projective transformation matrix as the learning target for photo rectification. Additionally, to avoid the expensive collection of natural data, synthetic CXR photos are generated that account for natural perturbations, extra screens, etc. We evaluate the proposed approach in the CheXphoto smartphone-captured CXR photo classification competition hosted by the Stanford University Machine Learning Group, where our approach won first place with a large performance improvement (AUC: ours 0.850, second-best 0.762). A deeper study demonstrates that PTRN brings classification performance on spatially transformed CXR photos to the same level as on high-quality digital CXR images, indicating that PTRN can eliminate the negative impact of projective transformation on CXR photos.
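To make concrete what predicting a projective transformation matrix buys, the sketch below shows that once a 3x3 matrix H is known, rectification amounts to applying its inverse to image coordinates. This is not PTRN itself: the matrix values are hypothetical, no network is involved, and `apply_homography` and `invert_3x3` are names introduced only for this example.

```python
def apply_homography(H, point):
    """Map a 2D point through the 3x3 projective matrix H (homogeneous divide)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def invert_3x3(M):
    """Invert a 3x3 matrix via the adjugate; raises if the matrix is singular."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adjugate]

if __name__ == "__main__":
    # Hypothetical distortion, standing in for a matrix a model might predict.
    H = [[1.0, 0.2, 5.0], [0.1, 0.9, -3.0], [0.001, 0.002, 1.0]]
    distorted = apply_homography(H, (100.0, 50.0))
    rectified = apply_homography(invert_3x3(H), distorted)
    print(distorted, rectified)
```

In practice the inverse mapping would be applied to every pixel with interpolation, typically through an image library rather than by hand.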
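Understanding 3D motion in a dynamic scene is essential to many vision applications. Recent progress has mainly focused on estimating the activity of specific elements such as humans. In this paper, we leverage a neural motion field to estimate the motion of all points in a multi-view setting. Modeling motion from a dynamic scene is challenging because of ambiguities among points of similar color and points with time-varying color. We propose to regularize the estimated motion to be predictable: if the motion from previous frames is known, then the motion in the near future should be predictable. We therefore introduce a predictability regularization by first conditioning the estimated motion on latent embeddings and then adopting a predictor network to enforce predictability on the embeddings. The proposed framework, PREF (Predictability Regularized Fields), achieves results on par with or better than state-of-the-art neural motion field-based dynamic scene representation methods, while requiring no prior knowledge of the scene.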
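Federated learning (FL) is a machine learning paradigm in which local nodes collaboratively train a central model while the training data remains decentralized. Existing FL methods typically share model parameters or employ co-distillation to address the problem of unbalanced data distribution. However, they suffer from communication bottlenecks and, more importantly, risk privacy leakage. In this work, we develop a privacy-preserving and communication-efficient method in an FL framework that uses one-shot offline knowledge distillation with unlabeled, cross-domain public data. We propose a quantized and noisy ensemble of local predictions from fully trained local models, which provides stronger privacy guarantees without sacrificing accuracy. Based on extensive experiments on image classification and text classification tasks, we show that our privacy-preserving method outperforms baseline FL algorithms in both accuracy and communication efficiency.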
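A rough sketch of what a quantized and noisy ensemble of local predictions could look like is given below. The abstract does not specify the noise distribution, the quantization scheme, or the order of operations, so everything here is an assumption: Gaussian noise added per node, clipping to [0, 1], uniform quantization to a fixed number of levels, and averaging on the server. The names `node_message` and `ensemble_label` are hypothetical.

```python
import random

def node_message(probabilities, levels=4, noise_scale=0.0, rng=None):
    """Perturb and quantize one node's class probabilities before sharing them."""
    if rng is None:
        rng = random.Random(0)
    message = []
    for p in probabilities:
        noisy = p + rng.gauss(0.0, noise_scale)      # assumed Gaussian noise
        clipped = min(1.0, max(0.0, noisy))          # keep the value in [0, 1]
        message.append(round(clipped * levels) / levels)  # snap to k/levels
    return message

def ensemble_label(messages):
    """Average the node messages and return (pseudo-label, averaged scores)."""
    n_classes = len(messages[0])
    averaged = [
        sum(m[c] for m in messages) / len(messages) for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged

if __name__ == "__main__":
    # Toy predictions from three local models for one unlabeled public sample.
    local = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
    rng = random.Random(42)
    messages = [node_message(p, levels=4, noise_scale=0.05, rng=rng) for p in local]
    print(messages, ensemble_label(messages))
```

Coarser quantization shrinks each message, which is one way such a scheme could reduce communication; the resulting pseudo-labels on public data would then serve as distillation targets for the central model.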
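Fully supervised human mesh recovery methods are data-hungry and generalize poorly because of the limited availability and diversity of 3D-annotated benchmark datasets. Recent progress has been made with synthetic-data-driven training paradigms, in which models are trained from synthetic paired 2D representations (e.g., 2D keypoints and segmentation masks) and 3D meshes. However, synthetic dense correspondence maps (i.e., IUV) have rarely been explored, because the domain gap between synthetic training data and real test data is hard to address for dense 2D representations. To alleviate this domain gap on IUV, we propose cross-representation alignment, which uses complementary information from a robust but sparse representation (2D keypoints). Specifically, the alignment errors between the initial mesh estimate and both 2D representations are forwarded to the regressor and dynamically corrected in the subsequent mesh regression. This adaptive cross-representation alignment explicitly learns from the deviations and captures complementary information: robustness from the sparse representation and richness from the dense one. We conduct extensive experiments on multiple standard benchmark datasets and demonstrate competitive results, helping to reduce the annotation effort needed to produce state-of-the-art models in human mesh estimation.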
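We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory, as a first step toward constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues in constructing and evaluating terms extracted from noisy domain text. We also make available two open corpora in research mathematics, specifically in category theory: a small corpus of abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).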
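This paper considers optimization proxies for Optimal Power Flow (OPF), i.e., machine learning models that approximate the input/output relationship of OPF. Recent work has focused on showing that such proxies can be of high fidelity. However, their training requires significant data, and each instance requires solving (offline) an OPF for a sample of the input distribution. To meet the requirements of market-clearing applications, this paper proposes Active Bucketized Sampling (ABS), a novel active learning framework that aims to train the best possible OPF proxy within a time limit. ABS partitions the input distribution into buckets and uses an acquisition function to determine where to sample next. It relies on an adaptive learning rate that increases and decreases over time. Experimental results demonstrate the benefits of ABS.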
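The bucket-and-acquisition idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the ABS algorithm from the paper: the input is reduced to a single hypothetical load-scaling factor, the acquisition function simply picks the bucket with the highest current validation error, and the names `make_buckets`, `choose_bucket`, and `sample_round` are invented for this example. The adaptive learning rate is not modeled.

```python
import random

def make_buckets(low, high, count):
    """Split the interval [low, high] into equally wide (lower, upper) buckets."""
    width = (high - low) / count
    return [(low + i * width, low + (i + 1) * width) for i in range(count)]

def choose_bucket(scores):
    """Acquisition step (assumed): pick the bucket with the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

def sample_round(buckets, scores, batch_size, rng):
    """Draw batch_size new inputs uniformly from the bucket chosen by acquisition."""
    index = choose_bucket(scores)
    lower, upper = buckets[index]
    return index, [rng.uniform(lower, upper) for _ in range(batch_size)]

if __name__ == "__main__":
    buckets = make_buckets(0.8, 1.2, 4)           # hypothetical load-scaling range
    validation_error = [0.01, 0.05, 0.02, 0.03]   # hypothetical per-bucket proxy error
    index, new_inputs = sample_round(buckets, validation_error, 5, random.Random(0))
    # Each new input would then be labeled by solving an OPF instance offline.
    print(index, new_inputs)
```

Because every sampled input costs one offline OPF solve, concentrating samples where the proxy is currently weakest is one plausible way to spend a fixed time budget.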